feat(e2e): agentic verification loop with MCP Playwright browser layer by informatico-madrid · Pull Request #128 · tzachbon/smart-ralph

informatico-madrid · 2026-04-01T17:06:20Z

Summary

This PR adds an agentic verification loop to Smart Ralph: the agent reads a user story, reasons about what "working" looks like, explores the system creatively, and self-corrects when something breaks — without scripted steps or Gherkin.

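All changes are additive. Most of them live inside plugins/ralph-specum/; the exceptions are FORK_GOALS.md and the .gitignore entries. Core flow and existing commands are unchanged, and stop-watcher.sh only gains new branches for the repair loop and the regression sweep while its existing behavior stays as it was.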

Problem

Classical test coverage strategies were designed for humans writing tests for humans to read. When an agent writes and executes those same tests:

- Tests pass but don't verify real behavior (mock-only anti-patterns)
- High coverage numbers give false confidence
- Nobody maintains the test suite because the agent keeps rewriting it
- No self-correction when something breaks mid-spec

The root issue: classical testing is a human artifact, not a verification mechanism designed for agentic loops.

The Idea

Give the agent a contract of what to observe and a license to explore — not a script to follow.

We call this the Verification Contract: a lightweight section in each requirements.md that tells the agent what to observe, not how to test.

What's Added

1. Verification Contract in specs (`templates/requirements.md`, `agents/product-manager.md`)

Each spec's requirements.md gets a `## Verification Contract` section:

## Verification Contract

**Entry points**: routes, endpoints, or UI surfaces this story touches
**Observable signals**: what PASS looks like (HTTP status, visible element, persisted data)
**Hard invariants**: what must NEVER break (auth, permissions, adjacent flows)
**Seed data**: minimum system state needed to verify
**Dependency map**: other specs/modules that share state with this one
**Escalate if**: conditions that require human judgment

The product-manager agent gains guidelines for populating this section from user stories.
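The following minimal sketch shows how a lint step or agent could read this section back out of requirements.md and flag empty fields. It makes two assumptions: each field sits on its own line in the `**Field**: value` form shown above, and the section ends at the next `## ` heading. `parse_verification_contract` and `missing_fields` are hypothetical helpers, not part of the PR.

```python
from __future__ import annotations

import re

# Field names as they appear in the Verification Contract template.
CONTRACT_FIELDS = (
    "Entry points",
    "Observable signals",
    "Hard invariants",
    "Seed data",
    "Dependency map",
    "Escalate if",
)

# Matches a template line such as "**Entry points**: /invoices".
FIELD_RE = re.compile(r"^\*\*(?P<name>[^*]+)\*\*:\s*(?P<value>.*)$")


def parse_verification_contract(markdown: str) -> dict[str, str]:
    """Return {field: value} from the ## Verification Contract section."""
    fields: dict[str, str] = {}
    in_section = False
    for line in markdown.splitlines():
        if line.startswith("## "):
            # Entering the contract section, or leaving it at the next H2.
            in_section = line.strip() == "## Verification Contract"
            continue
        if not in_section:
            continue
        match = FIELD_RE.match(line.strip())
        if match and match.group("name") in CONTRACT_FIELDS:
            fields[match.group("name")] = match.group("value").strip()
    return fields


def missing_fields(fields: dict[str, str]) -> list[str]:
    """Template fields that are absent or left empty, in template order."""
    return [name for name in CONTRACT_FIELDS if not fields.get(name)]
```

Catching an empty field while the spec is being written is cheaper than discovering during [STORY-VERIFY] that nothing says what PASS looks like.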

2. Exploratory `qa-engineer` (`agents/qa-engineer.md`)

New [STORY-VERIFY] mode: given a user story + Verification Contract, the agent derives and executes checks autonomously. No Gherkin. No scripted steps. Emits structured signals: VERIFICATION_PASS, VERIFICATION_FAIL, FINDING.

Example checks derived from a single story ("filter invoices by date"):

- Does the filter actually filter, or just visually sort?
- Does the filter state persist across page reload?
- Does it respect the user's timezone?
- Can it be combined with other filters?

None of these come from a script. They come from reasoning about intent.
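The signal names above are the only part of the contract the PR states; it does not specify a line format. This sketch makes two assumptions for illustration: each signal starts a line, optionally followed by `: detail`, and a single VERIFICATION_FAIL fails the story while FINDINGs are informational only.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Signal names from [STORY-VERIFY]; the "SIGNAL: detail" line format is assumed.
SIGNALS = ("VERIFICATION_PASS", "VERIFICATION_FAIL", "FINDING")
SIGNAL_RE = re.compile(r"^(?P<name>" + "|".join(SIGNALS) + r")\b:?\s*(?P<detail>.*)$")


@dataclass
class Signal:
    name: str
    detail: str


def extract_signals(agent_output: str) -> list[Signal]:
    """Collect signals from qa-engineer output, in the order emitted."""
    found = []
    for line in agent_output.splitlines():
        match = SIGNAL_RE.match(line.strip())
        if match:
            found.append(Signal(match.group("name"), match.group("detail").strip()))
    return found


def story_verdict(signals: list[Signal]) -> str:
    """Assumed rule: any FAIL fails the story; FINDINGs never do on their own."""
    names = {s.name for s in signals}
    if "VERIFICATION_FAIL" in names:
        return "VERIFICATION_FAIL"
    if "VERIFICATION_PASS" in names:
        return "VERIFICATION_PASS"
    return "NO_SIGNAL"  # hypothetical label: the agent emitted no verdict
```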

3. Repair loop (`hooks/scripts/stop-watcher.sh`)

When VERIFICATION_FAIL is detected:

→ classify failure (impl_bug / env_issue / spec_ambiguity / flaky)
→ impl_bug: backtrack to originating task, apply targeted fix
→ rerun verification for that story only
→ pass: continue to regression sweep
→ fail again after 2 iterations: escalate to human
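The flow above can be sketched as a small decision function. Two assumptions are labelled in the code. First, `repairIteration` is taken to count failures of the current story, starting at 1. Second, the three non-impl_bug classes go straight to a human, because the PR only describes the impl_bug path.

```python
from __future__ import annotations

from enum import Enum

MAX_REPAIR_ITERATIONS = 2  # "fail again after 2 iterations: escalate to human"


class FailureClass(Enum):
    IMPL_BUG = "impl_bug"
    ENV_ISSUE = "env_issue"
    SPEC_AMBIGUITY = "spec_ambiguity"
    FLAKY = "flaky"


def next_repair_action(failure: FailureClass, repair_iteration: int) -> str:
    """Action after a VERIFICATION_FAIL.

    repair_iteration mirrors the repairIteration state field, assumed here
    to count failures of the current story starting at 1.
    """
    if failure is not FailureClass.IMPL_BUG:
        # Assumption: only impl_bug is auto-repaired; the PR does not
        # describe env_issue / spec_ambiguity / flaky handling.
        return "escalate"
    if repair_iteration > MAX_REPAIR_ITERATIONS:
        return "escalate"  # still failing after two targeted fixes
    # Backtrack to originTaskIndex, apply a targeted fix, then rerun
    # [STORY-VERIFY] for failedStory only.
    return "backtrack_and_fix"


def on_verification_result(passed: bool, failure: FailureClass | None, repair_iteration: int) -> str:
    """Top-level step: a pass moves on to the regression sweep."""
    if passed:
        return "regression_sweep"
    if failure is None:
        raise ValueError("a failed verification needs a failure class")
    return next_repair_action(failure, repair_iteration)
```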

New .ralph-state.json fields: repairIteration, failedStory, originTaskIndex.
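This sketch records those fields under two assumptions: .ralph-state.json is a flat JSON object, and the counter resets when a different story fails. It writes through a temp file and `os.replace`, which is the temp+rename approach the later commits settle on for state. As a result, a concurrent reader never sees a half-written file.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path


def record_verification_failure(state_path: Path, story: str, origin_task_index: int) -> dict:
    """Update repairIteration / failedStory / originTaskIndex atomically."""
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    # Assumed rule: same story -> next iteration; a new story starts at 1.
    if state.get("failedStory") == story:
        state["repairIteration"] = state.get("repairIteration", 0) + 1
    else:
        state["repairIteration"] = 1
    state["failedStory"] = story
    state["originTaskIndex"] = origin_task_index

    # Write a temp file in the same directory, then rename it over the
    # original; os.replace is atomic on POSIX within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=state_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(state, fh, indent=2)
        os.replace(tmp_path, state_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return state
```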

4. Regression sweep (`hooks/scripts/stop-watcher.sh`)

After ALL_TASKS_COMPLETE, stop-watcher reads the **Dependency map** from the completed spec and runs targeted [STORY-VERIFY] sweeps on the dependent specs only. Sweeps come in three tiers: Local → Invariants → Full (nightly).
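This sketch shows how the dependent specs might be derived from the **Dependency map** line, assuming that line holds a comma- or semicolon-separated list of spec names. The PR does not spell out what each tier checks. The sketch therefore only models which specs get swept and that the Full tier is reserved for the nightly run. The helper names are hypothetical.

```python
from __future__ import annotations

import re

TIERS = ("local", "invariants", "full")  # Local → Invariants → Full (nightly)


def dependent_specs(dependency_map: str, known_specs: set[str]) -> tuple[list[str], list[str]]:
    """Split a Dependency map value into (specs to sweep, unknown names).

    Order is preserved and duplicates are dropped; names that match no
    existing spec are returned separately so they can be reported.
    """
    names: list[str] = []
    for raw in re.split(r"[,;]", dependency_map):
        name = raw.strip().strip("`")
        if name and name not in names:
            names.append(name)
    to_sweep = [n for n in names if n in known_specs]
    unknown = [n for n in names if n not in known_specs]
    return to_sweep, unknown


def tiers_to_run(nightly: bool) -> list[str]:
    """The Full tier only runs in the nightly job."""
    return list(TIERS) if nightly else list(TIERS[:2])
```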

5. MCP Playwright browser layer (`plugins/ralph-specum/skills/e2e/`)

Four core skill files cover the browser verification protocol:

| Skill | Purpose |
|---|---|
| `playwright-env.skill.md` | Project config: app URL, auth mode, token keys, seed command |
| `mcp-playwright.skill.md` | Full `@playwright/mcp` protocol: tool selection, verification sequence, devtools tracing, PASS/FAIL/DEGRADED/ESCALATE signals |
| `playwright-session.skill.md` | Session lifecycle: open → auth → verify → close. Auth branching by mode (cookie / storage-state / token / form). Lock recovery. |
| `ui-map-init.skill.md` | Living selector registry. MCP-first exploration (Step 1A), static analysis fallback (Step 1B). Incremental updates by spec-executor and qa-engineer. |
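The auth branching in playwright-session mostly comes down to ordering: whether credentials are injected before or after the first navigation. This sketch encodes the ordering rules stated in the later commit notes:

- cookie and storage-state inject before navigating.
- Token Pattern A navigates to appUrl first, because localStorage cannot be written on a blank origin.
- form starts at loginUrl.
- oauth/sso without a pre-authenticated session escalates.

Treating oauth/sso *with* a saved session as storage-state is an assumption of this sketch. The step labels and the function are hypothetical; the real skill is prose instructions to the agent, not code.

```python
from __future__ import annotations


def session_start_plan(
    auth_mode: str,
    app_url: str,
    login_url: str | None = None,
    has_preauth_state: bool = False,
) -> list[str]:
    """Ordered Session Start steps for one authMode (hypothetical labels)."""
    if auth_mode in ("oauth", "sso"):
        if not has_preauth_state:
            # No pre-authenticated session: never attempt an interactive login.
            return [f"escalate:{auth_mode}"]
        auth_mode = "storage-state"  # assumption: reuse the saved session
    if auth_mode in ("cookie", "storage-state"):
        # Inject before the first navigation so the first request is authenticated.
        return [f"inject:{auth_mode}", f"navigate:{app_url}", "verify"]
    if auth_mode == "token":
        # Pattern A: localStorage needs a real origin, so navigate first.
        return [f"navigate:{app_url}", "inject:token", "verify"]
    if auth_mode == "form":
        if not login_url:
            return ["escalate:missing-loginUrl"]  # missing context: don't guess
        return [f"navigate:{login_url}", "submit:credentials", "verify"]
    if auth_mode == "none":
        return [f"navigate:{app_url}", "verify"]
    raise ValueError(f"authMode not modelled in this sketch: {auth_mode}")
```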

task-planner auto-injects a VE0 task (the ui-map-init prerequisite) before the first Playwright task of any spec that uses Playwright, so nobody has to remember to run it.
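This sketch of the injection rule makes two assumptions. A task counts as a Playwright task when its Skills list includes mcp-playwright, and VE0 carries the load order playwright-env → mcp-playwright → playwright-session → ui-map-init described in the later commits. The `Task` shape is hypothetical. Because the step is idempotent, re-planning a spec never adds a second VE0.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# VE0 skill load order, as listed in the later commit notes.
VE0_SKILLS = ["playwright-env", "mcp-playwright", "playwright-session", "ui-map-init"]


@dataclass
class Task:
    task_id: str
    title: str
    skills: list[str] = field(default_factory=list)


def make_ve0() -> Task:
    return Task("VE0", "ui-map-init prerequisite", list(VE0_SKILLS))


def inject_ve0(tasks: list[Task]) -> list[Task]:
    """Return a new task list with VE0 before the first Playwright task."""
    if any(t.task_id == "VE0" for t in tasks):
        return list(tasks)  # already injected: keep planning idempotent
    for i, task in enumerate(tasks):
        if "mcp-playwright" in task.skills:  # assumed detection rule
            return tasks[:i] + [make_ve0()] + tasks[i:]
    return list(tasks)  # no Playwright task, nothing to inject
```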

Key design decisions:

- The browser is one signal layer, not the foundation. Signals come from CLI, HTTP, browser, and logs.
- The agent never launches or kills the MCP server. The human configures it in their MCP client (`--isolated --caps=testing`).
- No auto-install. If @playwright/mcp is missing, the agent emits VERIFICATION_DEGRADED and escalates to a human.
- ui-map.local.md is gitignored: it is local to each developer and grown incrementally by the agents that use it.
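One way to express the "no auto-install" rule: the agent inspects the tools that the human-configured MCP server already exposes. If the browser tools are absent, it degrades without running npx or touching the server. The choice of required tools is an assumption of this sketch; `browser_navigate`, `browser_snapshot`, and `browser_close` are tool names that @playwright/mcp exposes.

```python
from __future__ import annotations

# Minimum tool set this sketch requires before a verification session.
REQUIRED_TOOLS = {"browser_navigate", "browser_snapshot", "browser_close"}


def browser_layer_status(available_tools: set[str]) -> tuple[str, list[str]]:
    """Return (signal, missing tools) without installing or launching anything."""
    missing = sorted(REQUIRED_TOOLS - available_tools)
    if missing:
        # Degrade and let a human fix the MCP client config.
        return "VERIFICATION_DEGRADED", missing
    return "READY", []  # "READY" is a hypothetical label
```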

Files Changed

New files:

- plugins/ralph-specum/skills/e2e/playwright-env.skill.md
- plugins/ralph-specum/skills/e2e/mcp-playwright.skill.md
- plugins/ralph-specum/skills/e2e/playwright-session.skill.md
- plugins/ralph-specum/skills/e2e/ui-map-init.skill.md
- plugins/ralph-specum/skills/e2e/e2e-verify-integration.skill.md
- plugins/ralph-specum/skills/e2e/homeassistant-selector-map.skill.md
- FORK_GOALS.md

Modified files (additive only):

- templates/requirements.md: added `## Verification Contract` section
- agents/product-manager.md: guidelines to populate the Verification Contract
- agents/qa-engineer.md: [STORY-VERIFY] mode
- agents/task-planner.md: VE task auto-injection, VE0 prerequisite
- agents/spec-executor.md: skill load order for VE tasks, session end reminders, ui-map patch after data-testid changes
- agents/architect-reviewer.md: mandatory test strategy with mock rules
- plugins/ralph-specum/hooks/scripts/stop-watcher.sh: repair loop + regression sweep
- .gitignore: `**/ui-map.local.md`

What This Is NOT

- Not a Gherkin/BDD framework
- Not a replacement for qa-engineer (extends it)
- Not a browser-testing tool (browser is one signal layer)
- Not a full rewrite (every change is small and additive)

Summary by CodeRabbit

New Features
- Added E2E verification layer with Playwright browser testing integration
- Introduced automated UI Map for selector discovery and management
- Added Verification Contract section in requirements for defining test observables
- Implemented repair loop for automated issue diagnosis and retries
- Added regression sweep to verify dependent specs after changes

Documentation
- Updated requirement and design templates with verification guidance
- Added Playwright selector strategy and configuration examples
- Expanded spec workflow to include project type classification

…ore and FORK_GOALS
- Rename selector-map.skill.md → homeassistant-selector-map.skill.md (HA-specific examples, reusable as reference for other projects)
- Add ui-map-init.skill.md: agnostic protocol for generating ui-map.local.md in any project (runs once per project, output is gitignored)
- Add `**/ui-map.local.md` to .gitignore
- Update FORK_GOALS.md Phase 0.1: remove coupled selector hierarchy from description, reference new filenames, stay generic

…-playwright skills
- Add playwright-env.skill.md: resolves appUrl, authMode, credentials refs, seed data, browser config, and safety limits before any browser tool call. Emits ESCALATE if critical context is missing instead of guessing.
- Add playwright-env.local.md.example: full template covering all auth modes (none, form, token, cookie, basic, oauth, storage-state). The local copy (playwright-env.local.md) is gitignored by design.
- Update mcp-playwright.skill.md (v1→v2): add Step -1, which loads playwright-env before the dependency check. Add an anti-pattern entry for skipping Step -1.
- Update playwright-session.skill.md (v1→v2): read appUrl/authMode/browser config from .ralph-state.json → playwrightEnv instead of hardcoded values. Add an explicit auth flow per mode (form, token, cookie, basic, storage-state, oauth/sso). Add an ESCALATE rule for oauth/sso without a pre-authenticated session.

README.fork.md already references playwright-env.skill.md and playwright-env.local.md.example; both now exist.

- .gitignore: add playwright-env.local.md and .ralph-auth-state.json to prevent accidental commits of local auth config and storage-state files.
- playwright-env.skill.md (v1→v2): add an explicit connectivity check step after appUrl is resolved. It uses curl with a 5 s timeout before writing playwrightEnv to state and emits ESCALATE(app-not-reachable) if the check fails. The checklist item changes from a vague 'reachable' note to a concrete step reference.
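Here is a Python analogue of that connectivity check, for illustration only; the skill itself uses curl. Treating any HTTP response, including 4xx/5xx, as "reachable" is an assumption. The point is to detect a server that is down, not one that rejects an anonymous request.

```python
from __future__ import annotations

import urllib.error
import urllib.request


def check_app_reachable(app_url: str, timeout: float = 5.0) -> str:
    """Return "OK" or "ESCALATE(app-not-reachable)" for appUrl."""
    try:
        with urllib.request.urlopen(app_url, timeout=timeout):
            return "OK"
    except urllib.error.HTTPError as err:
        # The server answered, just not with 2xx/3xx: it is reachable.
        err.close()
        return "OK"
    except OSError:
        # URLError (DNS failure, refused connection) and timeouts are
        # all OSError subclasses.
        return "ESCALATE(app-not-reachable)"
```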

…w-5 seed order, yellow-6 stable-state timeout, white-7 jq race note
- playwright-session.skill.md (v2→v3):
  - orange-3: expand the token auth section with 3 concrete injection patterns (localStorage, Authorization header, cookie fallback) so the agent never improvises. Add a tokenBootstrapRule field to the playwright-env.local.md format.
  - yellow-5: clarify seedCommand execution order: it runs in playwright-env AFTER the connectivity check and BEFORE writing playwrightEnv to state.
  - yellow-6: replace the vague 'stable state' with a concrete criterion: call browser_snapshot and check that the tree contains no [aria-busy] or loading indicators; if any are found, retry once after 1000 ms. Document the tool to use.
- ui-map-init.skill.md (v1→v2):
  - orange-4: add Step -1 (load playwright-env) before Step 0, matching the pattern in mcp-playwright.skill.md. Stop if ESCALATE is received.
- playwright-env.skill.md (v2→v3):
  - yellow-5: add a seedCommand execution step between the connectivity check and writing playwrightEnv to state. Only runs on local/staging.
- mcp-playwright.skill.md (v2→v3):
  - white-7: add a note in Step 0 about the jq write race condition in parallel execution scenarios. Recommend a basePath-scoped lock or sequential VE tasks.
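The following sketches the yellow-6 criterion. `take_snapshot` stands in for a browser_snapshot call and returns the snapshot text. `aria-busy` comes from the commit note; the other busy markers are assumptions that a real project would tune.

```python
from __future__ import annotations

import time
from typing import Callable

# "aria-busy" is from the commit note; the other markers are assumptions.
BUSY_MARKERS = ("aria-busy", "progressbar", "Loading")


def is_stable(snapshot_text: str) -> bool:
    """True when the accessibility snapshot shows no busy/loading marker."""
    return not any(marker in snapshot_text for marker in BUSY_MARKERS)


def wait_for_stable_state(take_snapshot: Callable[[], str], retry_delay_s: float = 1.0) -> bool:
    """Check once, retry once after 1000 ms, then report instead of waiting on."""
    if is_stable(take_snapshot()):
        return True
    time.sleep(retry_delay_s)
    return is_stable(take_snapshot())
```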

@latest

Critical fixes:
- playwright-session: fix token Pattern A (injecting into localStorage before goto fails on a blank origin)
- playwright-env: wrap RALPH_SEED_COMMAND in eval + quotes to handle args/spaces
- mcp-playwright: replace npx @latest (which downloads) with a --no-install check + lock recovery

Cache & isolation:
- mcp-playwright: add RALPH_PLAYWRIGHT_ISOLATED setting, --isolated flag, lock recovery protocol
- playwright-session: add Session End lock-file cleanup on close failure
- playwright-env: add RALPH_PLAYWRIGHT_ISOLATED to Settings Reference + tokenLocalStorageKey

Minor:
- mcp-playwright: clarify the snapshot-only vs screenshot-always contradiction
- playwright-session: add CAPTCHA/2FA ESCALATE detection in form auth
- playwright-env: add a TTL warning for the state fallback from a previous session

…on, mcp context reset, auth table header
- ui-map-init: fix `date -r` (Linux only) → portable stat fallback for macOS
- mcp-playwright: Step 0b now always runs when isolated=false, not conditionally
- playwright-session: clarify that an MCP server restart is required between sessions
- playwright-env: fix the broken markdown table header in the Authentication section

…sk numbering contract, spec-workflow e2e bridge
- reality-verification: add project-type detection (api-only/cli/library skip Playwright entirely)
- reality-verification: document the VE task numbering contract (VE0 = ui-map-init, 4.3 = verify fix)
- spec-workflow: reference e2e skills in the implement phase, add project-type to requirements phase activities

mcp-playwright MUST load before playwright-session: session start reads .ralph-state.json → mcpPlaywright, which only mcp-playwright Step 0 writes. Loading playwright-session first makes it find the key absent and wrongly fall into degraded mode.

Correct order (matches spec-workflow/SKILL.md and phase-transitions.md):
1. playwright-env
2. mcp-playwright
3. playwright-session
4. ui-map-init (VE0 only)
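Here is a hypothetical lint for a VE task's Skills list against that order. It checks three things: the relative order of the e2e skills present, that playwright-session is never loaded without mcp-playwright, and that ui-map-init appears only in VE0.

```python
from __future__ import annotations

# Required load order; ui-map-init is listed as VE0-only.
SKILL_ORDER = ("playwright-env", "mcp-playwright", "playwright-session", "ui-map-init")


def check_skill_order(skills: list[str], task_id: str) -> list[str]:
    """Return problems with a VE task's skill list (empty list = OK)."""
    problems = []
    e2e = [s for s in skills if s in SKILL_ORDER]
    expected = [s for s in SKILL_ORDER if s in e2e]
    if e2e != expected:
        problems.append(f"{task_id}: load order {e2e} should be {expected}")
    if "playwright-session" in e2e and "mcp-playwright" not in e2e:
        # Session start reads mcpPlaywright, written only by mcp-playwright Step 0.
        problems.append(f"{task_id}: playwright-session needs mcp-playwright loaded first")
    if "ui-map-init" in e2e and task_id != "VE0":
        problems.append(f"{task_id}: ui-map-init is listed as VE0-only")
    return problems
```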

…fig, not agent commands
The skill documented npx launch commands as if the agent executes them directly. In a Claude MCP context the agent never launches the server; the human configures it in their MCP client (claude_desktop_config.json or equivalent). The agent only calls the browser_* tools that the already-running server exposes.
- Remove the Protocol A section implying the agent runs npx to start the server
- Add a human-config note: --isolated and --caps=testing must be set in the MCP server definition, not invoked by the agent at runtime
- Preserve the isolation mode documentation as context for the agent (so it knows what behaviour to expect from the server)

…/npx)
The MCP server is a long-running process managed by the human in their MCP client config. The agent never launches or kills it.
- Remove Steps 4 & 5 from Session Start (pkill + npx launch)
- Remove "Stop the MCP server process" from Session End
- Remove pkill from the Session End abnormal-termination block
- Context Isolation: replace "Restart MCP server" with an ESCALATE note
- Cleanup Checklist: remove the server-process check; agent responsibility ends at browser_close + lock recovery

Consistent with mcp-playwright.skill.md v6 (human-config note).

…xt myth, resolvedAt timestamp, ui-map session clarity, spec-executor session end
Issues fixed:
- #2 🔴 playwright-session Start Steps 4/5/6: the auth sequence was broken for cookie/storage-state. cookie and storage-state now inject BEFORE any navigation; form goes to loginUrl, not appUrl; token (Pattern A) navigates to appUrl, then injects. The unconditional Step 4 is replaced with an authMode-conditional branching table.
- #1 🟠 playwright-session Step 4: removed the false claim 'MCP server creates a fresh context on each navigation'. Added an explicit note: browser_navigate does NOT reset state within a session. Isolation comes from the --isolated flag at server startup, not per navigation.
- #3 🟡 playwright-env Write State: added a resolvedAt ISO timestamp so the staleness warning in Resolution Order point 3 is actually computable by the agent.
- #4 🟠 ui-map-init: added an explicit note that VE0 opens its own browser session (it does not reuse any prior session) and must follow Session End from playwright-session when done.
- #6 🟠 spec-executor: added an explicit Session End reminder after VE task completion so sessions do not leak between consecutive VE tasks.

…row for multiple VE tasks
spec-executor is the authority on session policy between VE tasks. The "Multiple VE tasks in same spec: Same context OK" row contradicted the mandatory Session End rule in spec-executor, so it was removed to eliminate the ambiguity. Added a clarifying note that the "during" scope means sub-steps within a single VE task, not across VE tasks.

- Replaced the vitest (Node.js) runner with bats (Bash test framework)
- Updated the Test Discovery section with bats commands
- Updated the Mock Boundary table with bash-appropriate mocks
- Updated the Concurrent writes row with the background-subshells pattern
- Verified: grep vitest returns CLEAN

Co-Authored-By: Claude Opus 4.6 <[email protected]>

- design.md: added bash to the ACK message block and text to the CLOSE Thread Example
- requirements.md: added text to the FR-2 message format example
- Note: lines 289 and 306 had no bare fences in the current file state

Co-Authored-By: Claude Opus 4.6 <[email protected]>

…-executor, format fixes
- C1 (spec-executor.md): add flock-based atomic write (was bare `cat >>`)
- C2 (requirements.md): fix the NFR-1 vs FR-13 contradiction (clarify flock for chat.md, temp+rename for state)
- M3/M4 (spec-executor.md and external-reviewer.md): BLOCK→HOLD (5+ references)
- M5 (chat.md): real entries now follow the canonical `### [writer → addressee]` format
- M8 (design.md): remove the language tag from closing backticks
- N1 (templates/chat.md): add text language to the fenced block
- N5 (chat.md, spec instance): add text language to the fenced block
- N8 (.progress.md): mark the temp+rename note as superseded
- N11 (tasks.md): rename lastReadIndex→lastReadLine globally
- N13 (task_review.md): remove the duplicate timestamp in resolved_at

Co-authored-by: Qwen-Coder <[email protected]>
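C1's flock-based append amounts to the following, sketched in Python with `fcntl.flock` (POSIX only). Concurrent writers are serialised, so entries never interleave. The entry header follows the `### [writer → addressee]` format from M5; the function name is hypothetical.

```python
from __future__ import annotations

import fcntl
import os
from pathlib import Path


def append_chat_entry(chat_path: Path, writer: str, addressee: str, body: str) -> None:
    """Append one chat.md entry while holding an exclusive flock."""
    entry = f"### [{writer} → {addressee}]\n{body.rstrip()}\n\n"
    with open(chat_path, "a", encoding="utf-8") as fh:
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)  # blocks until no other writer holds it
        try:
            fh.write(entry)
            fh.flush()
            os.fsync(fh.fileno())
        finally:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)
```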

… progress.md typo
- N2 (index-state.json): 'complete' → 'completed' for ralph-quality-improvements
- N3 (index.md): empty status → 'done' for ralph-quality-improvements
- M6 (research.md): clarify the two activation thresholds (existence vs existence + 1 message)
- N7 (.progress.md): 'docs/agen-chat/' → 'docs/agent-chat/'

Co-authored-by: Qwen-Coder <[email protected]>

Add a mandatory chat.md check step between task parsing and delegation. The coordinator now reads new messages from chat.md (after lastReadLine), blocks on HOLD/PENDING, responds to OVER, and announces each task before delegating. This activates the bidirectional protocol already defined in spec-executor.md and external-reviewer.md.

The Chat Protocol section already defined the correct bidirectional behavior but was disconnected from the numbered Task Loop. Add step 2a so the executor reads chat.md on every iteration BEFORE checking task_review.md (step 2b), mirroring the pilot callout pattern now enforced in coordinator-pattern.md.

Expand Step 4 to cover all signals defined in external-reviewer.md:
- INTENT-FAIL: log + wait 1 cycle before delegating
- DEADLOCK: hard stop, surface to human
- URGENT: treat as HOLD
- CONTINUE: no-op, proceed
- ALIVE / STILL: heartbeats, ignore

The previous commit only handled HOLD, PENDING, and OVER.
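The coordinator's reactions listed above can be written as a lookup. The notes do not specify how to combine several signals read in one pass. This sketch assumes the most severe action wins and that heartbeats and unknown signals are ignored. The action labels are hypothetical.

```python
from __future__ import annotations

# Coordinator reaction per chat.md signal, as listed in the commit notes.
SIGNAL_ACTIONS = {
    "HOLD": "block",
    "PENDING": "block",
    "URGENT": "block",  # treated as HOLD
    "OVER": "respond",
    "INTENT-FAIL": "log_and_wait_one_cycle",
    "DEADLOCK": "stop_and_surface_to_human",
    "CONTINUE": "proceed",
    "ALIVE": "ignore",  # heartbeat
    "STILL": "ignore",  # heartbeat
}

# Assumed precedence when one read returns several signals (low → high).
SEVERITY = ["proceed", "respond", "log_and_wait_one_cycle", "block", "stop_and_surface_to_human"]


def coordinator_action(new_signals: list[str]) -> str:
    """Pick one action for the messages read since lastReadLine."""
    actions = [SIGNAL_ACTIONS[s] for s in new_signals if s in SIGNAL_ACTIONS]
    actions = [a for a in actions if a != "ignore"]
    if not actions:
        return "proceed"
    return max(actions, key=SEVERITY.index)
```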

… layer, EXECUTOR_START as Layer 0
Changes from V2 (agent self-fix):
- Role Definition: add 3 lines making chat.md usage explicit in Integrity Rules
- Chat Protocol: add Step 2b, which reads task_review.md before delegating (defense-in-depth)
- Verification: add Layer 2b anti-fabrication (independent verify command execution)
- Update all 'all 3 layers' references to 'all 4 layers'

Changes from V3 (external analysis):
- Reposition Step 2b as a standalone section BEFORE the Chat Protocol `<mandatory>` block
- Add a defense-in-depth comment explaining the intentional duplication vs spec-executor
- Integrate EXECUTOR_START verification as Layer 0 (was a floating section, now a numbered blocker)
- Update Verification Summary to list all 5 layers (0+4)

…s.md unmark
- G4 (Section 0 Bootstrap): add a step to read chat.md and check for active HOLD/PENDING/DEADLOCK signals before starting the Review Cycle. This prevents the reviewer from starting blind when a conversation is already in progress.
- G2 (Section 6b unmark): wrap the tasks.md unmark write in an flock exclusive lock (same pattern as the chat.md atomic append) to prevent a race with the coordinator reading tasks.md to advance taskIndex concurrently.

Add a TASK_AMBIGUOUS pre-execution signal (inspired by ChatDev's dehallucination communicative pattern) to break the retry loop caused by ambiguous task blocks.

spec-executor.md:
- New section 'Ambiguity Detection (Pre-Execution)' before the Task Loop
- Criteria for emitting TASK_AMBIGUOUS: contradictory instructions, missing required context, impossible constraints, undefined references
- Output format mirrors TASK_MODIFICATION_REQUEST (structured, parseable)
- Guard: max 1 TASK_AMBIGUOUS per task to prevent clarification loops

coordinator-pattern.md:
- New handler in 'After Delegation' for TASK_AMBIGUOUS output
- Coordinator enriches the task block with the clarification and re-delegates
- Does NOT increment taskIteration (ambiguity is a spec error, not an execution error)
- Logs the clarification applied to .progress.md for auditability
- Max 2 clarification rounds per task before escalating to a human

channel-map.md (new):
- Documents all shared filesystem channels, writers, readers, and timing
- Explicitly marks channels with multiple writers (race condition risk)
- Explains the locking strategy per channel
- Reference document for future protocol decisions and new agent onboarding
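The coordinator side of TASK_AMBIGUOUS can be sketched as follows. Clarification rounds are counted separately from taskIteration and capped at two before a human is asked. `TaskState` and the returned action labels are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_CLARIFICATION_ROUNDS = 2


@dataclass
class TaskState:
    task_iteration: int = 0
    clarification_rounds: int = 0


def handle_task_ambiguous(state: TaskState) -> str:
    """Coordinator reaction to a TASK_AMBIGUOUS output.

    Ambiguity counts as a spec error, so task_iteration is left alone;
    only clarification_rounds advances.
    """
    if state.clarification_rounds >= MAX_CLARIFICATION_ROUNDS:
        return "escalate_to_human"
    state.clarification_rounds += 1
    return "enrich_task_and_redelegate"
```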

…lls for VE tasks
- Added a detailed E2E / VE task review section in external-reviewer.md to enforce proper review protocols.
- Updated spec-reviewer.md with a new E2E review rubric for evaluating VE tasks.
- Mandated `Skills:` metadata in VE tasks within task-planner.md so the necessary skills are loaded.
- Revised coordinator-pattern.md to enforce strict anti-pattern rules for VE tasks and required skills.
- Enhanced phase-rules.md with specific requirements for VE2 tasks to ensure comprehensive user flow verification.
- Updated verification-layers.md to differentiate artifact type instructions for VE/E2E tasks.
- Introduced a new SKILL.md for E2E that outlines the skill suite for end-to-end testing.
- Expanded homeassistant-selector-map.skill.md with native navigation documentation for Home Assistant.
- Enhanced playwright-session.skill.md with a mandatory unexpected-page recovery protocol.
- Updated the tasks.md template to require skills for all VE tasks and clarified task structures.

Fixes applied (7 real issues from 18 comments):
- tasks.md: add playwright-session to VE0 Skills (both POC/TDD blocks)
- tasks.md: add selector-map to VE2 Skills (both POC/TDD blocks)
- task-planner.md: align the Skills example to the `**Skills**:` bold format + fix a duplicate `</mandatory>`
- external-reviewer.md: resolve the tool-permissions contradiction (allow post-task test exec)
- homeassistant-selector-map.skill.md: fix the false 'English across locales' claim; use data-panel-id for sidebar items instead of localized getByRole names
- spec-reviewer.md: simplify the 'No fixed waits' FAIL to match the absolute PASS prohibition
- coordinator-pattern.md: update 3 'Required Skills' refs to the 'Skills:' metadata field

False positives (not changed):
- VE3 platform_skills: the cleanup task doesn't need platform-specific skills
- artifactType header in verification-layers: style preference, the inline approach works
- `||` table syntax in playwright-session: valid Markdown single-pipe tables
- [VERIFY] in submode detection: the layered design already handles this correctly
- MD040 fence warnings: agent instruction files, not user-facing rendered docs

informatico-madrid added 30 commits April 1, 2026 02:53

refactor(skills): remove generic selector-map.skill.md (renamed to homeassistant-selector-map.skill.md)

c81feae

docs: update .gitignore and FORK_GOALS for renamed skills

6e56bc9

- .gitignore: add `**/ui-map.local.md` (project-specific selector map, never committed)
- FORK_GOALS.md Phase 0.1: update filenames, remove coupled selector hierarchy from description, stay domain-agnostic

feat(task-planner): inject ui-map-init before first Playwright E2E task

3c62d34

docs: add Phase 0.2 — task-planner auto-injects ui-map-init for Playwright

0ba45ab

feat(phase1+2): verification contract in requirements template + product-manager + exploratory qa-engineer

7f74e21

feat(phase3+4): repair loop + regression sweep in stop-watcher

f5ecd5b

docs: mark Phase 3+4 complete in FORK_GOALS

3154a7c

feat(specum): Phase 5 - MCP Playwright skill with dependency check and degradation

8de254e

docs(fork-goals): mark Phase 5 complete, update status and contribution strategy

2dbf977

docs(skills): update legacy ui-map-init to point to Phase 5 superseding version

36bbdf6

docs: add README.fork.md documenting fork deltas and PR contribution notes

15fc560

fix(ui-map-init): add map invalidation logic + protected-route auth handling (#4 #9)

6869fb3

feat(architect-reviewer): mandatory test strategy with mock rules v0.3.0

4effae6

feat(spec-executor): test writing guardrails — read strategy, no skip without issue ref v0.4.0

f82f845

fix(templates): update design.md Test Strategy to match architect-reviewer v0.3.0

dc0ef4d

fix(spec-workflow): clarify quick mode reviewer, add --fresh to decision tree v0.3.1

22c3df5

fix(spec-executor): document state cleanup on SPEC_COMPLETE v0.4.1

bd1811c

fix(spec-executor): handle VERIFICATION_DEGRADED from VE0 (v0.4.4)

1f53731

informatico-madrid and others added 30 commits April 8, 2026 16:42

feat(external-reviewer): add tool permissions, Judge pattern, convergence detection, human as participant

6b3604e

Co-Authored-By: Claude Opus 4.6 <[email protected]>

chore(spec): update progress for task 5.13

a8e9818

chore(external-reviewer): bump version to 0.2.0 for reviewer improvements

f259711

Co-Authored-By: Claude Opus 4.6 <[email protected]>

chore: mark task 5.14 complete in tasks.md and .progress.md

d09f822

Co-Authored-By: Claude Opus 4.6 <[email protected]>

fix(agent-chat-protocol): address PR #9 review feedback — atomic write bug, inconsistencies, reviewer improvements

24628b9

Co-Authored-By: Claude Opus 4.6 <[email protected]>

chore(spec): final progress update for agent-chat-protocol

a205387

chore(progress): update progress and review log for task 5.15 completion

93722e3

Merge branch 'main' into feat/agent-chat-protocol

9272067

Merge pull request #9 from informatico-madrid/feat/agent-chat-protocol

8c0b66a

Feat/agent chat protocol

fix(spec-reviewer): update selector grounding criteria for E2E reviews and improve clarity in task templates

456b757

Merge pull request #10 from informatico-madrid/fix/e2e-implementation-chat-coordinator

8ae6844

feat(e2e): enhance verification processes and introduce mandatory ski…

Merge upstream/main into main

f901b44

Update fork with latest changes from tzachbon/smart-ralph

fix(.gitignore): add node_modules, .serena, and .qwen to ignore list

dc0f8f3

feat: add supervisor role and verification checks for external-reviewer signals in QA process

c1dcaf9

feat: add SPEC-ADJUSTMENT and SPEC-DEFICIENCY signals, enhance error detection and handling in agents

54adea4

feat: improve error handling and adjustments in task verification in agents

d748c31

feat: improve verification logic and error handling in the external-reviewer and qa-engineer agents

6a414d4

Merge pull request #11 from informatico-madrid/improve-flat-flow

80a15b3

Improve flat flow

informatico-madrid wants to merge 399 commits into tzachbon:main from informatico-madrid:main
